updated: 2022-01-23_12:32:31-05:00
Chomsky Normal Form
Guarantied to resolve a Push Down Automata in (2n-1)
steps
For some ambiguous grammar we can find unambiguous grammar that generates the same language (ie trompsky normal)
Some Context Free Languages can only be generated using ambiguous grammar. Such languages are called Inherently ambiguous and cannot be put in trompsky form
CNF:
A grammar is in CNF if every rule is of the form:
A->BC
A->a
Where a is any terminal and A B & C are any variable
also, s->e is permitted where s is the start variable
Four Steps to Converting:
-
Add a new start variable $S_0$
and a new rule $S_0 \rightarrow S$ where S is the original start variable
This guarantees that the start variable is not on RHS -
Eliminate e rules
Remove an e rule $A\rightarrow$e where A is not the start variable
If $R \rightarrow UAV$ where U and V are strings of variables and terminals
we add the rule $R\rightarrow UV$
We do this for every occurance of AEg: R -> uAvAw so we add...
R -> uvAw
R -> uAvw
R -> uvw -
Remove unit rules $A \rightarrow B$
Remove unit rule then whenever rule $B\rightarrow u$ appears, add $A\rightarrow u$
where u is a string of variables and terminals
Unit rules is a variable going to a single variable (not a terminal) -
Convert all remaining rules to proper form
Replace $A\rightarrow u_1 , u_2 , ... u_k$ where k $\geq$ 3
Until each $u_i$ is a variable or terminal symbol with rules:
$A\rightarrow u_1A_1$
$A\rightarrow u_2A_2$
...
$A\rightarrow u_kA_k$We replace any terminal $u_i$ in the preceding rules with the new variable $U_i$ and add rule:
$U_i\rightarrow u_i$
Example: converting to CNF
S -> A S A | a B
A -> B | S
B -> b | e
what's wrong?
B going to epsilon
S goes to ASA (more than 2)
S goes to aB (variable and terminal in same rule)
A goes to B | S (both unit rules)
.
.
- New start symbol $S_0$
S0 -> S
S -> ASA | aB
A -> B | S
B -> b | e
.
.
2. Remove e rules B -> e
S0 -> S
S -> ASA | aB | a
A -> B | S | e
B -> b
(add e transitions)
Now we gotta finish, adding the possibilities for ASA (where A can be e)
S0 -> S
S -> ASA | AS | SA | S | aB | a
A -> B | S
B -> b
.
.
3. Unit Rules (remove s->s)
S0 -> S
S -> ASA | AS | SA | aB | a
A -> B | S
B -> b
(remove s0->S)
S0 -> ASA | AS | SA | aB | a
S -> ASA | AS | SA | aB | a
A -> B | S
B -> b
(remove A->B)
S0 -> ASA | AS | SA | aB | a
S -> ASA | AS | SA | aB | a
A -> b | S
B -> b
(remove A->S)
S0 -> ASA | AS | SA | aB | a
S -> ASA | AS | SA | aB | a
A -> b | ASA | AS | SA | aB | a
B -> b
At this point:
* marks non compliant bits
S0 -> *ASA | AS | SA | *aB | a
S -> *ASA | AS | SA | *aB | a
A -> b | *ASA | AS | SA | *aB | a
B -> b
Let's add some rules:
A1 -> AS
U -> a
This becomes:
S0 -> A1 A | AS | SA | UB | a
S -> A1 A | AS | SA | UB | a
A -> b | A1 A | AS | SA | UB | a
B -> b
Don't convert the single a's, because that would violate unit rules
CNF
$\therefore$ (2n-1) steps for string of length n